NSF PAR Search | NSF Public Access Repository

Full Text Available
FUDJ: Flexible User-Defined Distributed Joins

https://doi.org/10.1109/ICDE60146.2024.00320

Sevim, Akil; Eldawy, Ahmed; Carman, E Preston; Carey, Michael J; Tsotras, Vassilis J (May 2024, IEEE)

Join operations are crucial in data analysis but can be inefficient on large datasets and with complex non-equality-based conditions. Optimized join algorithms have gained traction in database research to address these challenges. One popular choice for implementing join algorithms is distributed data processing frameworks, e.g., Hadoop and Spark, but each implementation is highly tailored to specific query types. As a result, these implementations do not address join queries that involve diverse and complex conditions, since they are not integrated into a holistic query optimization engine as in a DBMS. On the other hand, implementing new join algorithms in a DBMS from scratch requires substantial effort and expertise. This paper introduces FUDJ, Flexible User-Defined Distributed Joins, a framework for complex distributed join algorithms. The key idea of FUDJ is to allow developers to integrate new distributed join algorithms into the database without delving into the database internals. As shown, an algorithm implemented in FUDJ is up to an order of magnitude faster than existing user-defined implementations, with an order of magnitude fewer lines of code.
Full Text Available
HQ-Filter: Hierarchy-Aware Filter For Empty-Resulting Queries in Interactive Exploration

https://doi.org/10.1109/MDM52706.2021.00019

Sevim, Akil; Eldawy, Ahmed (June 2021, The IEEE International Conference on Mobile Data Management, MDM)
Modern visual data exploration systems are designed as client-server applications in which the front-end interface generates a large number of queries to the back-end, where they are handled by a database server. Because data exploration is a trial-and-error process, a significant number of these queries return an empty result, which does not change the state of the visualization. These requests still add significant overhead in network communication, request handling, and data processing. Moreover, given the virtually unlimited query space, it is impractical to enumerate and send all empty (or all non-empty) queries to the client to filter them. This paper introduces HQ-Filter, a hierarchy-aware filter for empty-resulting queries, which utilizes the hierarchical nature of the data to construct a configurable and probabilistic filter. HQ-Filter can filter out empty-resulting queries at the client side with minimal size and processing overhead. HQ-Filter is applied to two existing data exploration systems for geospatial data, UCR-Star and Cloudberry. In both cases, it successfully eliminates hundreds of queries per user, which results in up to a 66% increase in server capacity by providing up to a 15x speedup in average response time and up to a 90% decrease in server workload.
Full Text Available
A Demonstration of Interactive Exploration of Big Geospatial Data on UCR-Star

https://doi.org/10.1145/3397536.3422334

Ghosh, Saheli; Sevim, Akil; Eldawy, Ahmed (November 2020, SIGSPATIAL '20: Proceedings of the 28th International Conference on Advances in Geographic Information Systems)
The ever-rising volume of geospatial data is undeniable. So is the need to explore and analyze these datasets. However, these datasets vary widely in their size, coverage, and accuracy. Therefore, users need to assess these aspects of the data to choose the right dataset for their analysis. Unfortunately, all the publicly available repositories for geospatial datasets provide only a list of datasets with some information about them, with no way to explore the datasets beforehand. Through this demonstration, we propose UCR-Star, a repository capable of hosting hundreds of thousands of geospatial datasets that a user can explore visually to judge their quality before even downloading them. This demo provides a deeper dive into the core engine behind UCR-Star. It provides a web interface geared towards database researchers to help them understand how the index works internally. It provides a comparison interface where attendees can see side by side how two versions of the system work, with the ability to customize each of them separately. Finally, the interface reports the response time of the indexes for a quantitative comparison.
Full Text Available
Beast: Scalable Exploratory Analytics on Spatio-temporal Data

https://doi.org/10.1145/3459637.3481897

Eldawy, Ahmed; Hristidis, Vagelis; Ghosh, Saheli; Saeedan, Majid; Sevim, Akil; Siddique, A.B.; Singla, Samriddhi; Sivaram, Ganesh; Vu, Tin; Zhang, Yaming (October 2021, Conference on Information and Knowledge Management (CIKM))

Full Text Available
A brief introduction to geospatial big data analytics with Apache AsterixDB

https://doi.org/10.1145/3486189.3490018

Sevim, Akil; Mahin, Mehnaz Tabassum; Vu, Tin; Maxon, Ian; Eldawy, Ahmed; Carey, Michael; Tsotras, Vassilis (January 2021, SIGSPATIAL/GIS)

Full Text Available